Abstract

In this paper, we propose a methodology for quantifying
temporal patterns of nonlinear hashtag time series. Our
approach is based on an analogy between neuron spikes
and hashtag diffusion. We adopt the local variation,
originally developed to analyze local time delays in
neuron spike trains. We show that the local variation
successfully characterizes nonlinear features of hashtag
spike trains such as burstiness and regularity. We apply
this understanding to an extreme social event and are
able to observe the temporal evolution of the online
collective attention of Twitter users to that event.

Introduction 

Hashtag diffusion in the Twitter social network is nonlinear
in time. Pairwise or higher order temporal correlations,
bursts, and regular patterns are observed in data analysis.
The distribution of time delays between two successive
hashtag activities follows a power-law scaling with fat tails
(Domenico et al. 2013), in contrast to the exponential
distribution expected for an independent Poisson process.
A potential reason is that earlier hashtags influence later
ones, such that past hashtags can both cooperate and
compete with present hashtags (Myers and Leskovec 2012;
Coscia 2013). Further factors driving time-dependent
hashtag propagation are the heterogeneity of individual
online user behavior at the micro scale, self-organized
cascades (Cheng et al. 2014) due to unequal selection
(Ratkiewicz et al. 2010; Weng et al. 2012; Gleeson et al.
2014; Coscia 2013; Cetin and Bingol 2014; Gleeson et al.
2015) in the hashtag pool at the macro scale, and the
underlying cyclic rhythm of tweeting habits (Myers and
Leskovec 2014; Franca et al. 2014; Mollgaard and Mathiesen
2015; Sanli and Lambiotte 2015). Despite this highly
nonlinear nature, tools to characterize hashtag time series,
beyond obtaining the distribution functions, have not yet
been considered in detail.

Extreme social events such as elections and protests
(Borge-Holthoefer et al. 2011; Gonzalez-Bailon et al. 2011),
announcements of scientific innovations (Domenico et al.
2013), and panic events such as crises (Kenett et al. 2014)
and earthquakes (Sasahara et al. 2013) artificially deform
the Twitter network and encourage a massive amount of
hashtag activity in a short time window, as shown in
Figure 1. The resulting emergent online behavior has been
studied both empirically (Yang and Leskovec 2011;
Mollgaard and Mathiesen 2015) and theoretically (Mollgaard
and Mathiesen 2015), and distinct temporal properties of
collective attention have been quantified. These properties
are important for predicting such extreme but rare social
events (Kenett et al. 2014; Miotto and Altmann 2014).

Figure 1: Hashtag spike trains of #ledebat on different days
covering extreme social events, namely the debate of the
French presidential election 2012 held on May 2 and the
election held on May 6, and a regular day between them,
May 4. The upper row represents the dynamics on the debate
day. Collective attention during the debate produces a
tremendous amount of activity on the hashtag, so we observe
a continuous series, in contrast to the distinguishable
spikes before 4 pm. The middle row is for a regular day
after the debate, followed by the spike train on the
election day in the bottom row. A decay in the activity on
the hashtag #ledebat is visible from top to bottom, and the
process suggests highly nonlinear characteristics on each
day.

Our main motivation is to establish a systematic
methodology to distinguish real noisy hashtag signals from
independent random signals and to extract temporal patterns
from the real signals. We apply an approach called the
local variation $L_V$, originally introduced to analyze
noisy neuron spike trains and to extract the salient
dynamics of neurons (Shinomoto, Shima, and Tanji 2003;
Miura, Okada, and Amari 2006; Omi and Shinomoto 2011).
After justifying the use of $L_V$ in semantic analysis,
which has been done extensively in our recent work (Sanli
and Lambiotte 2015), we present a promising study of the
evolution of collective attention by applying $L_V$ to a
political election. A remarkable difference in $L_V$ during
the rush period suggests that local nonlinear features
could predict extreme social events.

Data Set 


Data Collection 

The data is collected via the publicly available Twitter
API. A narrow time window, from April 30, 2012 to May 10,
2012, is chosen on purpose to cover two social events: the
political debate of the French presidential election 2012,
held on May 2, and the election day, May 6. Having 10 days
of data lets us visualize activity on regular days, both
between and after these extreme events, and compare the
differences in hashtag dynamics. During this period, we
consider all tweeting activity, but only from users located
in France, to avoid dealing with time differences between
countries and regions and with other social events taking
place in the same period. The time resolution is 1 second
and no language selection is applied.

We examine 295,697 unique hashtags found in 2,942,239
tweets that include at least one hashtag; these tweets make
up 30% of all tweets. 228,525 online users, almost half of
all online users, are associated with hashtag diffusion.
The network in this period contains hashtags directly
related to the debate, the election, and the two candidates
for the presidency of France, François Hollande and Nicolas
Sarkozy. Ranking them by the number of appearances
(frequency), or equivalently popularity p, from the most
popular to the least, we have #ledebat (180946), #hollande
(143636), #sarkozy (116906), #votehollande (99908),
#avecsarkozy (67549), #ledébat (66668) [in French],
#france2012 (20635), #presidentielle (13799), and many
others with smaller p. The numbers in parentheses give the
corresponding p. These popular hashtags are at the top of
the pool, e.g. the top 0.0001% of all hashtags.

Real Hashtag Spike Trains 

The diffusion of a single hashtag in time can be
represented as a spike train, as shown in Figure 1. Each
spike indicates that the corresponding hashtag is used at
that time, without specifying how or by which user. With a
resolution of 1 second, multiple events occurring within
the same second cannot be distinguished, and in this
situation only one appearance is counted. We construct
spike trains for all hashtags observed in the data, ordered
from the earliest appearance time to the latest, e.g.
$\ldots, \tau_{i-1}, \tau_i, \tau_{i+1}, \ldots$. Each
hashtag has a unique number of (exact) appearances, its
popularity p.
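The construction above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the paper: the input format (pairs of hashtag and timestamp in whole seconds) and the function name `build_spike_trains` are hypothetical.

```python
from collections import defaultdict


def build_spike_trains(records):
    """Build one spike train per hashtag.

    `records` is assumed to be an iterable of (hashtag, second) pairs,
    where `second` is a timestamp at 1-second resolution. Several uses
    of a hashtag within the same second collapse into a single spike.
    """
    seen = defaultdict(set)
    for hashtag, second in records:
        seen[hashtag].add(int(second))  # a set keeps one spike per second
    # Order each train from the earliest to the latest spike time.
    return {tag: sorted(times) for tag, times in seen.items()}


# Toy input: "#a" is used twice in second 10, which counts as one spike.
records = [("#a", 10), ("#a", 10), ("#a", 12), ("#b", 11), ("#a", 11)]
trains = build_spike_trains(records)
```

In this sketch `len(trains[tag])` is the number of spikes of a hashtag, which can be smaller than its raw number of appearances whenever several uses fall within the same second.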

Randomized Hashtag Spike Trains 

To compare the real dynamics with an artificial and
independent one, a randomized version of the real hashtag
spike trains is built to serve as a null model. First, all
spikes coming from any hashtag are combined, giving a
single merged hashtag spike train. The uniform spike
appearance, one spike at a spike time, still holds.
Randomized child spike trains are obtained by uniformly
permuting the matrix T of the spike times of the merged
train and drawing p of them, where p is the number of
spikes of the real train we compare with. We apply
randperm(T, p) in Matlab and obtain p uniformly
distributed, unique, and independent random spike times,
e.g. $\ldots, t_i, \ldots$.
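The null model can be sketched as follows. The paper uses Matlab's randperm; this Python version is only an analogous illustration that draws p distinct spike times uniformly at random from the merged train. The function name `randomized_train` and the `seed` parameter are assumptions made for the example.

```python
import random


def randomized_train(merged_times, p, seed=None):
    """Draw a randomized spike train of p spikes from the merged train.

    `merged_times` holds the spike times of all hashtags combined.
    Duplicates are removed first so that there is one spike per spike
    time, then p distinct times are sampled uniformly without
    replacement and returned in increasing order.
    """
    unique_times = sorted(set(merged_times))
    if p > len(unique_times):
        raise ValueError("p exceeds the number of distinct spike times")
    rng = random.Random(seed)  # seeded generator for reproducibility
    return sorted(rng.sample(unique_times, p))


# Toy merged train; a real train with 4 spikes gets a 4-spike null train.
merged = [1, 2, 2, 5, 7, 8, 8, 13, 21]
null_train = randomized_train(merged, 4, seed=0)
```

Sampling from the merged train, rather than from a flat time axis, keeps the overall daily rhythm of activity in the null model while destroying the correlations between the spikes of any single hashtag.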


Local Variation 

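The local variation $L_V$, specifically defined to quantify
nonlinear neural time-series and to uncover temporal
patterns in neuron spike trains, is defined in terms of the
spike times $\tau_i$ (Omi and Shinomoto 2011) as

\[
L_V = \frac{3}{N-2} \sum_{i=2}^{N-1}
      \left( \frac{\Delta\tau_{i+1} - \Delta\tau_i}
                  {\Delta\tau_{i+1} + \Delta\tau_i} \right)^2 ,
\tag{1}
\]

where N is the number of spikes in the train,
$\Delta\tau_{i+1} = \tau_{i+1} - \tau_i$, and
$\Delta\tau_i = \tau_i - \tau_{i-1}$. $\Delta\tau_{i+1}$
quantifies the forward delay and $\Delta\tau_i$ represents
the backward waiting time. Importantly, the denominator
normalizes the quantity so as to account for local
variations of the rate at which events take place.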
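Equation (1) translates directly into code. The following Python function is a minimal sketch, assuming spike times given as numbers with at most one spike per spike time; the name `local_variation` is chosen here for illustration.

```python
def local_variation(spike_times):
    """Local variation L_V of a spike train, following Eq. (1).

    Duplicate spike times are removed (one spike per spike time) and
    at least three spikes are required, since every term of the sum
    needs one backward and one forward interval.
    """
    times = sorted(set(spike_times))
    n = len(times)
    if n < 3:
        raise ValueError("at least three distinct spike times are needed")
    # Inter-spike intervals: intervals[k] = times[k + 1] - times[k].
    intervals = [b - a for a, b in zip(times, times[1:])]
    total = 0.0
    # Each pair is (backward waiting time, forward delay) around a spike.
    for backward, forward in zip(intervals, intervals[1:]):
        total += ((forward - backward) / (forward + backward)) ** 2
    return 3.0 * total / (n - 2)


regular = local_variation([0, 10, 20, 30, 40])  # equal intervals
bursty = local_variation([0, 1, 101, 102])      # short, long, short
```

For the perfectly regular train every term vanishes and the result is 0. For the second train the intervals alternate between 1 and 100, each term equals (99/101)^2, and the result is 3 (99/101)^2, close to the upper limit of 3.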

By definition, $L_V$ takes values in the interval [0, 3].
Furthermore, it can be shown that $L_V$ is on average equal
to 1, $\langle L_V \rangle = 1$, if the underlying process
is an independent Poisson process, for which the
distribution of the inter-spike intervals is exponential
(Shinomoto, Shima, and Tanji 2003). Here, the brackets
denote the average taken over the given distribution. All
other situations can be generalized by Gamma processes
(Shinomoto, Shima, and Tanji 2003; Miura, Okada, and Amari
2006), for which $\langle L_V \rangle$ should differ
significantly from 1. For instance,
$\langle L_V \rangle \approx 3$ if the hashtag spike trains
are extremely bursty (irregular), whereas
$\langle L_V \rangle \approx 0$ when the trains present
regular (homogeneous) temporal patterns (Sanli and
Lambiotte 2015).
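The bounds and the Poisson value quoted above can be checked with a short calculation. It is a sketch that assumes independent, identically distributed exponential inter-spike intervals, and it introduces the auxiliary variable $X$ only for this purpose. Each term of the sum in $L_V$ can be written as

\[
\left( \frac{\Delta\tau_{i+1} - \Delta\tau_i}
            {\Delta\tau_{i+1} + \Delta\tau_i} \right)^2
= (1 - 2X)^2 ,
\qquad
X = \frac{\Delta\tau_i}{\Delta\tau_i + \Delta\tau_{i+1}} \in [0, 1] ,
\]

so every term lies between 0 and 1 and therefore $0 \le L_V \le 3$. The lower limit is reached for equal intervals ($X = 1/2$) and the upper limit is approached when neighbouring intervals differ strongly ($X \to 0$ or $X \to 1$). For two independent exponential intervals with the same rate, $X$ is uniformly distributed on $[0, 1]$, which gives

\[
\left\langle (1 - 2X)^2 \right\rangle
= \int_0^1 (1 - 2x)^2 \, dx = \frac{1}{3} ,
\qquad
\langle L_V \rangle
= \frac{3}{N-2} \, (N-2) \, \frac{1}{3} = 1 .
\]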

Figure 2 shows the results of our $L_V$ analysis for both
real and randomized hashtag spike trains. The probability
distribution $P(L_V)$ of the calculated values of $L_V$ on
the two data sets, with hashtags grouped by popularity p,
shows distinct behavior. Whereas
$\langle L_V \rangle = 1$ for every group of p in the
randomized trains, suggesting Poisson processes,
$\langle L_V \rangle$ is never equal to 1 for the real
trains. The randomization dampens the nonlinearity of the
real trains (temporal correlations, burstiness, and
regularity in the series) and constructs statistically
stationary and independent processes.
Figure 2: Probability density function $P(L_V)$ of the
local variation $L_V$ of hashtag spike trains (Sanli and
Lambiotte 2015). (a) Real hashtag spike trains. We observe
a clear shift of the peak positions to higher values of
$L_V$ with decreasing hashtag popularity p, which indicates
that the process becomes bursty (irregular). For any p, the
mean value is never 1: none of the real signals is a
Poisson process. (b) Randomized hashtag spike trains.
Independent of p, all curves fluctuate around 1, as
expected for temporally independent signals. For better
visualization, the results are grouped by ranking p from
the most popular to the least popular hashtags: high p, red
and orange symbols; moderate p, yellow and green symbols;
and low p, blue and purple symbols.
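A density such as $P(L_V)$ can be estimated from the $L_V$ values of one popularity group with a simple histogram. The sketch below is only an illustration: the number of bins, the function name `lv_density`, and the toy values are assumptions, and the binning used for Figure 2 is not specified in the text.

```python
def lv_density(values, bins=6, lo=0.0, hi=3.0):
    """Histogram estimate of the probability density of L_V values.

    The range [lo, hi] is split into `bins` equal bins. Counts are
    divided by (number of values * bin width), so that the estimate
    integrates to 1 over [lo, hi].
    """
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp the index so that v == hi falls into the last bin.
        idx = min(max(int((v - lo) / width), 0), bins - 1)
        counts[idx] += 1
    n = len(values)
    return [c / (n * width) for c in counts]


# Toy L_V values standing in for one popularity group.
density = lv_density([0.1, 0.2, 0.9, 1.1, 2.9, 3.0], bins=3)
```

Applying the same function separately to each popularity group, for both the real and the randomized trains, gives curves that can be compared in the way Figure 2 does.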


yet time-dependent events. Therefore, the randomized trains
in Figure 2(b) are characterized as time-dependent Poisson
processes, and $P(L_V)$ fluctuates around 1. However, all
nonlinearities are present in the real data, and so in
$P(L_V)$. The trains describe regular patterns for popular
hashtags (high p; red and orange symbols) and become bursty
(irregular), due to local temporal correlations, for
moderate popularity (yellow and green symbols) and low
popularity (blue and purple symbols). The trend is captured
in the shift of the peak positions of $P(L_V)$ from small
$L_V$ to large $L_V$ with decreasing p in Figure 2(a).
Consequently, we find that, not only for neurons but also
for hashtags, $L_V$ is a successful tool to characterize
salient dynamics in nonlinear social


Empirical Application: Collective Attention 

We now use $L_V$ for a more practical purpose and ask: Can
$L_V$ predict extreme social events? The investigation
presented below is far from complete. However, we are able
to capture the temporal evolution of online emergent
behavior resulting from the collective attention of Twitter
users to the French presidential election 2012, in the
first week of May 2012.

We specifically compare hashtag diffusion on the extreme
days, the debate day (May 2) and the election day (May 6),
with the dynamics on a regular day between these events,
May 4. Instead of considering all hashtags in the pool, as
done in the previous Section, we concentrate on
topic-related hashtags such as #ledebat (180946), #hollande
(143636), #sarkozy (116906), #votehollande (99908),
#avecsarkozy (67549), and #ledébat (66668) [in French]. The
numbers in parentheses indicate p of the corresponding
hashtag.

The local variation $L_V$ is obtained for these
topic-oriented hashtag spike trains. The trains are
constructed separately for the three days. $L_V$ for each
train and for each day is calculated in time windows of
1 hour. Figure 3 presents the results for the debate
(left), regular (middle), and election (right) days. The
top row [Figure 3(a)] shows $L_V(t)$ on these days at
hourly resolution. The bottom row [Figure 3(b)] summarizes
the tweeting activity as the number of tweets including the
hashtags listed in the legend versus time, again at hourly
resolution.
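The hourly analysis can be sketched as below. This is a minimal illustration under assumptions that the text does not fix: spike times are given in seconds, `day_start` marks midnight of the day of interest, only spikes inside an hour contribute to that hour, and hours with fewer than three spikes are skipped. The function name `hourly_local_variation` is hypothetical.

```python
def hourly_local_variation(spike_seconds, day_start):
    """Compute L_V separately in each 1-hour window of one day.

    Returns a dict {hour: L_V} for the hours (0 to 23) that contain at
    least three distinct spike times. Intervals are taken only between
    spikes that fall within the same hour.
    """
    by_hour = {}
    for t in sorted(set(spike_seconds)):
        hour = (t - day_start) // 3600
        if 0 <= hour < 24:
            by_hour.setdefault(hour, []).append(t)

    result = {}
    for hour, times in by_hour.items():
        if len(times) < 3:
            continue  # L_V is undefined for fewer than three spikes
        intervals = [b - a for a, b in zip(times, times[1:])]
        terms = [((fwd - back) / (fwd + back)) ** 2
                 for back, fwd in zip(intervals, intervals[1:])]
        result[hour] = 3.0 * sum(terms) / (len(times) - 2)
    return result


# Toy day: hour 0 is perfectly regular, hour 1 is bursty, hour 2 is sparse.
spikes = list(range(0, 3600, 60)) + [3600, 3601, 3701, 3702] + [7200, 7300]
lv_by_hour = hourly_local_variation(spikes, day_start=0)
```

In this toy example hour 0 gives 0 (regular), hour 1 gives 3 (99/101)^2 (bursty), and hour 2 is omitted because it has only two spikes.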

Rush hours in online communication during the debate and
the announcement of the election result are highlighted by
the shaded yellow rectangle and the yellow vertical line,
respectively. Significant decays in $L_V(t)$ on both the
debate and election days, synchronized with the peaks of
the counts, indicate regular activity of the online users
in the discussion of the election and so describe no
burstiness, $L_V(t) \approx 0$. This trend is not observed
at all on the regular day, where mainly the cyclic rhythm
of the Twitter network (Sanli and Lambiotte 2015)
characterizes the values of $L_V(t)$. While large
fluctuations are present in the inactive hours (0 am to
6 am), for the rest of the day
$\langle L_V(t) \rangle \approx 1$, suggesting
time-dependent Poisson processes. These results are
preliminary but promising, since the stages of collective
attention are clearly visible in $L_V(t)$.

Discussion and Future Work 

The main purpose of this paper is to establish a tool for
noisy social time-series and to uncover nonstationary
features and temporal patterns, specifically in an online
emergent limit. Our comparative test on the real and
randomized data sets shows that the local variation $L_V$,
a metric introduced to quantify the fluctuations of neuron
spike trains relative to a local characteristic time, also
works successfully for hashtag spike trains. This
encourages us to develop further tools, for instance to
predict extreme online events by evaluating the early noisy
signal prior to an extreme event. As an example, we
consider the week of the French presidential election 2012.
This narrow time window is well suited to our aim, and we
find that $L_V$ is sensitive enough to distinguish the
collective attention period, in which users are active
homogeneously in time, from the preceding period, in which
temporal heterogeneity is present. A prediction could
therefore be achieved by gathering better statistics on the
decay of $L_V(t)$.

Figure 3: Characterizing the temporal evolution of collective attention. From left to right, the debate day (May 2, 2012), a regular
day (May 4, 2012), and the election day (May 6, 2012) are shown. (a) The local variation $L_V(t)$ of the topic-related
hashtags about the debate and the election. The shaded yellow rectangle covers the debate hours and the yellow line indicates
the announcement of the election result. Significant decays in $L_V(t)$, in the left and right panels, match well with the schedule of the
events. However, no remarkable trend is observed on the regular day (middle panel). (b) Number of tweets per hour including at least
one of the hashtags listed in the legends. The increase in activity coincides with the decays in
$L_V(t)$, indicating that collective attention homogenizes the hashtag propagation, so that the hashtag spike trains in this limit
present temporal regularity.

We find that $L_V(t)$ is almost 0 in rush periods. Such
artificial regularity originates from our assumption, made
because of the lack of time resolution below 1 second.
Although we observe heterogeneity in the hashtag spike
trains during rush hours in the empirical data, making the
spike appearance uniform, by counting one spike at any
spike time, creates unnatural homogeneity in the emergent
limit. To resolve this, the trains should be constructed so
as to preserve the heterogeneity in the data, and $L_V$
must be redefined for a nonuniform number of spikes at
different spike times in a train.